@ldematte ldematte commented Nov 4, 2025

CuVSResourcesManager controls access to GPU resources for two purposes: keeping parallelism at a sensible level (more than one GPU thread, but with a reasonable upper bound), and limiting the amount of GPU memory in use to prevent CUDA out-of-memory errors.
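As a rough illustration of the parallelism side only, here is a minimal sketch of bounding concurrent GPU users with a semaphore. The class `BoundedGpuAccessSketch` and its methods are hypothetical names made up for this example; they are not taken from `CuVSResourcesManager`, which may work quite differently.

```java
// Hypothetical sketch: not the actual CuVSResourcesManager code.
final class BoundedGpuAccessSketch {
    // One permit per caller allowed to use the GPU at the same time.
    private final java.util.concurrent.Semaphore slots;

    BoundedGpuAccessSketch(int maxConcurrentUsers) {
        // Fair semaphore, so waiting callers are served in arrival order.
        this.slots = new java.util.concurrent.Semaphore(maxConcurrentUsers, true);
    }

    // Runs the given work while holding one slot; blocks while all slots are taken.
    <T> T withResource(java.util.function.Supplier<T> gpuWork) {
        slots.acquireUninterruptibly();
        try {
            return gpuWork.get();
        } finally {
            slots.release();
        }
    }

    // Number of slots that are currently unused.
    int freeSlots() {
        return slots.availablePermits();
    }

    public static void main(String[] args) {
        BoundedGpuAccessSketch pool = new BoundedGpuAccessSketch(2);
        Integer freeWhileWorking = pool.withResource(() -> pool.freeSlots());
        System.out.println("free slots during work: " + freeWhileWorking);
        System.out.println("free slots after work: " + pool.freeSlots());
    }
}
```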

This PR extends the memory control part in two ways. First, it introduces different strategies for memory accounting: "real", based on API calls to the device, and "tracking", which remembers the amount of memory requested during acquisition. Second, it introduces different memory estimations depending on the CAGRA graph build algorithm.

The accounting strategies will allow us to use pooled memory (where the amount of available memory differs from the free device memory); the per-algorithm estimations will allow us to use the IVFPQ CAGRA graph build algorithm for larger datasets.
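To make the difference between the two accounting strategies concrete, here is a minimal sketch under stated assumptions. All names (`MemoryAccounting`, `RealMemoryAccounting`, `TrackingMemoryAccounting`, `tryAcquire`) are hypothetical and not taken from the PR's code, and the device query is replaced by a plain `LongSupplier` because the actual device API call is not shown in this conversation.

```java
// Hypothetical sketch: names and structure are assumptions, not the PR's code.
final class MemoryAccountingSketch {

    // Strategy answering "how much GPU memory can still be handed out?".
    interface MemoryAccounting {
        long availableBytes();
        void onAcquire(long requestedBytes);
        void onRelease(long requestedBytes);
    }

    // "Real" accounting: ask the device on every check.
    // The supplier stands in for whatever device API reports free memory.
    static final class RealMemoryAccounting implements MemoryAccounting {
        private final java.util.function.LongSupplier freeDeviceMemory;

        RealMemoryAccounting(java.util.function.LongSupplier freeDeviceMemory) {
            this.freeDeviceMemory = freeDeviceMemory;
        }

        @Override public long availableBytes() { return freeDeviceMemory.getAsLong(); }
        @Override public void onAcquire(long requestedBytes) { /* the device is the source of truth */ }
        @Override public void onRelease(long requestedBytes) { /* the device is the source of truth */ }
    }

    // "Tracking" accounting: remember what was requested at acquisition time.
    // Useful when free device memory is not meaningful, e.g. with a pre-allocated pool.
    static final class TrackingMemoryAccounting implements MemoryAccounting {
        private final long capacityBytes;
        private long reservedBytes;

        TrackingMemoryAccounting(long capacityBytes) {
            this.capacityBytes = capacityBytes;
        }

        @Override public synchronized long availableBytes() { return capacityBytes - reservedBytes; }
        @Override public synchronized void onAcquire(long requestedBytes) { reservedBytes += requestedBytes; }
        @Override public synchronized void onRelease(long requestedBytes) { reservedBytes -= requestedBytes; }
    }

    // Admission check: reserve only if the estimated requirement fits.
    // The lock makes check-and-reserve atomic for the tracking strategy;
    // it cannot stop device memory from changing under the real strategy.
    static boolean tryAcquire(MemoryAccounting accounting, long requestedBytes) {
        synchronized (accounting) {
            if (requestedBytes > accounting.availableBytes()) {
                return false;
            }
            accounting.onAcquire(requestedBytes);
            return true;
        }
    }

    public static void main(String[] args) {
        MemoryAccounting tracking = new TrackingMemoryAccounting(1_000L);
        System.out.println("first request admitted: " + tryAcquire(tracking, 600L));
        System.out.println("second request admitted: " + tryAcquire(tracking, 600L));
        System.out.println("bytes still available: " + tracking.availableBytes());
    }
}
```

In this sketch the per-algorithm estimate would simply be the `requestedBytes` value passed to `tryAcquire`; how that estimate is computed for each CAGRA graph build algorithm is not described here and is left out.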

@ldematte ldematte added the >non-issue, auto-backport (Automatically create backport pull requests when merged), :Search Relevance/Vectors (Vector search), test-gpu (Run tests using a GPU), v9.3.0, and branch:9.2 labels Nov 4, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch) label and removed the branch:9.2 label Nov 4, 2025
Contributor

@ChrisHegarty ChrisHegarty left a comment

LGTM

@ldematte ldematte removed the test-gpu (Run tests using a GPU) label Nov 5, 2025
@ldematte ldematte enabled auto-merge (squash) November 5, 2025 11:09
@ldematte ldematte merged commit 136677b into elastic:main Nov 5, 2025
34 checks passed
@elasticsearchmachine
Collaborator

💚 Backport successful

Status Branch Result
9.2

ldematte added a commit to ldematte/elasticsearch that referenced this pull request Nov 5, 2025
elasticsearchmachine pushed a commit that referenced this pull request Nov 5, 2025
@ldematte ldematte deleted the gpu/improve-resource-manager branch November 5, 2025 14:37
Kubik42 pushed a commit to Kubik42/elasticsearch that referenced this pull request Nov 10, 2025

Labels

auto-backport (Automatically create backport pull requests when merged), >non-issue, :Search Relevance/Vectors (Vector search), Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch), v9.2.1, v9.3.0
